[Feature] Accelerate the comparison for batched #991

Merged
jan-janssen merged 14 commits into main from fast_batched on Jun 10, 2026

@jan-janssen (Member) commented May 28, 2026

Summary by CodeRabbit

  • Refactor

    • Updated the batched futures API to accept skip lists as futures instead of pre-materialized lists.
  • Tests

    • Updated test suite to validate the revised skip list parameter handling.

@coderabbitai (bot, Contributor) commented May 28, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9718215e-8c2f-4f4c-aef6-d7c7946a2e43

📥 Commits

Reviewing files that changed from the base of the PR and between f71b624 and dbf6852.

📒 Files selected for processing (3)
  • src/executorlib/standalone/batched.py
  • src/executorlib/task_scheduler/interactive/dependency.py
  • tests/unit/standalone/test_batched.py

📝 Walkthrough

batched_futures now accepts a list of Future[list] objects as the skip parameter instead of pre-materialized lists. The function dereferences these futures internally, builds a skip set using object identities, and adjusts batch sizing and filtering accordingly. The dependency scheduler and unit tests are updated to use the new signature.

Changes

Batched futures skip parameter refactoring

  • Batched futures API update (src/executorlib/standalone/batched.py): The function signature now accepts nested_skip_lst: list[Future[list]]; the implementation dereferences the futures via result(), builds an internal skip_set by collecting the id() of every skipped item, and recalculates batch sizing and filtering to use that set.
  • Dependency scheduler integration (src/executorlib/task_scheduler/interactive/dependency.py): _update_waiting_task now passes nested_skip_lst directly from task_wait_dict["kwargs"]["skip_lst"] to batched_futures, deferring future dereferencing and set construction to the API function.
  • Tests updated for nested_skip_lst (tests/unit/standalone/test_batched.py): Both test_batched_futures and test_batched_futures_not_finished now construct Future objects for the skip items and call batched_futures with the nested_skip_lst parameter; assertions are adjusted to match the new API behavior.
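The batched.py summary describes the new flow: dereference each future in nested_skip_lst with result(), collect the id() of every item into skip_set, then size and filter the batch against that set. A minimal sketch of that flow, assuming the n_expected / done_lst loop quoted in the review findings; batched_futures_sketch and make_future are hypothetical names, and this is not executorlib's actual source:

```python
from concurrent.futures import Future


def make_future(value):
    """Example helper (hypothetical): return an already-resolved Future."""
    future = Future()
    future.set_result(value)
    return future


def batched_futures_sketch(lst, n, nested_skip_lst):
    """Hypothetical re-creation of the identity-based filtering described above.

    lst: futures whose results are the items to batch.
    n: target batch size.
    nested_skip_lst: futures whose results are earlier batches (lists of items).
    """
    # Dereference the earlier batches and record the identity of every batched item.
    skip_set = {id(item) for f in nested_skip_lst for item in f.result()}
    n_expected = min(n, len(lst) - len(skip_set))
    done_lst = []
    for v in lst:
        # Only finished futures whose result object was not batched before are eligible.
        if v.done() and id(v.result()) not in skip_set:
            done_lst.append(v.result())
        if len(done_lst) == n_expected:
            return done_lst
    # Not enough eligible results yet: an empty list tells the caller to keep waiting.
    return []


# Distinct list objects: unhashable, and each one has its own id().
items = [[i] for i in range(5)]
lst = [make_future(x) for x in items]
first = batched_futures_sketch(lst, n=2, nested_skip_lst=[])
second = batched_futures_sketch(lst, n=2, nested_skip_lst=[make_future(first)])
third = batched_futures_sketch(
    lst, n=2, nested_skip_lst=[make_future(first), make_future(second)]
)
print(first, second, third)  # prints [[0], [1]] [[2], [3]] [[4]]
```

Keying on id() means the list items never need to be hashed, which is the concern raised in the inline comment on dependency.py. Identity keys only line up if each earlier batch holds the very same objects the member futures returned, with no copy or serialization in between. Also note that in CPython equal small integers share one object, so results such as 1, 1, 1 collapse to a single id() entry and can still hit the undercount described in the duplicate-value finding.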

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • pyiron/executorlib#1013: Both PRs modify batched_futures' skip-filtering logic to build an internal set of skipped items' identifiers and use it to exclude already-skipped completed future results.

  • pyiron/executorlib#756: This PR updates the skip-list contract of batched_futures and adjusts the "batched" branch of _update_waiting_task, directly modifying the batching implementation introduced in #756.

Suggested reviewers

  • samwaseda

Poem

🐰 A futures dance, now deferred with grace,
Skip-sets bloom where futures embrace,
Nested lists become promises untouched,
Until result() is called—not too much,
Batched and bound in identity's place!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check (❓ Inconclusive): The title "Accelerate the comparison for batched" is vague and does not clearly describe the actual change, which converts the skip_lst parameter from a list to a set for faster lookups. Resolution: consider a more descriptive title such as "Convert batched_futures skip parameter from list to set" or "Use set-based lookup for batched_futures skip comparison".
✅ Passed checks (3 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Linked Issues check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): Check skipped because no linked issues were found for this pull request.
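The title check attributes the speed-up to replacing list membership with set membership. A minimal sketch of that difference, using made-up integer data rather than real batch results: both containers give the same answers, but each `in` test on a list scans it linearly while a set does an average constant-time hash probe. This assumes hashable items; names such as eligible_from_list are hypothetical.

```python
import timeit

# Hypothetical data: 10 earlier batches of 100 items each, and 2,000 candidate results.
skip_lst = [list(range(start, start + 100)) for start in range(0, 1_000, 100)]
candidates = list(range(2_000))

# List-based lookup: every `in` test scans the flattened list.
skip_flat = [item for batch in skip_lst for item in batch]
# Set-based lookup: every `in` test is an average O(1) hash probe (items must be hashable).
skip_set = {item for batch in skip_lst for item in batch}


def eligible_from_list():
    return [c for c in candidates if c not in skip_flat]


def eligible_from_set():
    return [c for c in candidates if c not in skip_set]


# Same result either way; only the cost of each membership test differs.
same = eligible_from_list() == eligible_from_set()
t_list = timeit.timeit(eligible_from_list, number=3)
t_set = timeit.timeit(eligible_from_set, number=3)
print(f"same result: {same}  list: {t_list:.4f}s  set: {t_set:.4f}s")
```

Timings vary by machine, so none are quoted here; the hashability requirement is exactly what the later inline comment about unhashable batch outputs is concerned with.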


@coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/executorlib/standalone/batched.py (1)

19-25: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Fix batched_futures: value-deduped skip_set can overestimate n_expected and leave later batches permanently unresolved when duplicate result values occur.

In src/executorlib/standalone/batched.py, n_expected = min(n, len(lst) - len(skip_set)), but eligibility is decided by value via v.result() not in skip_set. In src/executorlib/task_scheduler/interactive/dependency.py, skip_set is built as a deduplicated set of earlier batch results. As a result, every occurrence of a value that appeared in an earlier batch is filtered out, while n_expected subtracts only one per distinct value.

When there aren’t enough remaining non-skipped values to fill the batch, batched_futures keeps returning [], and _update_waiting_task keeps the batch waiting forever.

Concrete example (single value repeated in the consumed batch):

  • lst results: [1, 1, 1, 2], n=3
  • Batch 1 returns [1, 1, 1]
  • Batch 2: skip_set={1}, so n_expected=min(3, 4-1)=3, but only the single future with result 2 is eligible → returns [] indefinitely.

Add tests with duplicate result values (and/or change the algorithm to track already-assigned counts or per-future identity rather than a value-deduped set).
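The failing scenario can be reproduced in a few lines. This is a hypothetical re-creation of the value-based logic quoted in the finding (n_expected = min(n, len(lst) - len(skip_set)) plus a v.result() not in skip_set filter); batched_by_value and make_future are illustrative names, not code from the PR:

```python
from concurrent.futures import Future


def make_future(value):
    """Example helper (hypothetical): return an already-resolved Future."""
    future = Future()
    future.set_result(value)
    return future


def batched_by_value(lst, n, skip_set):
    """Hypothetical value-based variant matching the logic quoted in the finding."""
    # Subtracts one per *distinct* skipped value.
    n_expected = min(n, len(lst) - len(skip_set))
    done_lst = []
    for v in lst:
        # Drops *every* occurrence of a skipped value.
        if v.done() and v.result() not in skip_set:
            done_lst.append(v.result())
        if len(done_lst) == n_expected:
            return done_lst
    return []


lst = [make_future(x) for x in [1, 1, 1, 2]]
first = batched_by_value(lst, n=3, skip_set=set())
second = batched_by_value(lst, n=3, skip_set=set(first))
print(first, second)  # prints [1, 1, 1] []
```

The second call returns [] because n_expected is still 3 while only the future holding 2 is eligible; a scheduler that re-polls whenever it gets [] would wait on this batch forever.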

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/executorlib/standalone/batched.py` around lines 19 - 25, batched_futures
overestimates n_expected because skip_set is value-deduped; change
batched_futures (and its use of skip_set) to account for duplicate result values
by tracking counts or per-future identity: either accept a skip_counts Counter
(value -> number skipped) or build a skip_futures set of already-assigned
futures, then compute n_expected as min(n, count of futures in lst not excluded
by the skip tracking) and when collecting done results decrement the skip count
or mark that specific future as skipped; update references to skip_set,
n_expected, lst and done_lst accordingly and add tests for duplicate-result
scenarios.
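The first remedy suggested in this prompt, tracking counts, can be sketched with collections.Counter: every skipped occurrence is subtracted from n_expected, and each earlier-batched value is skipped once per occurrence rather than everywhere. A minimal sketch under those assumptions; batched_with_counts and make_future are hypothetical names, and because Counter keys must be hashable this does not address the unhashable-output comment:

```python
from collections import Counter
from concurrent.futures import Future


def make_future(value):
    """Example helper (hypothetical): return an already-resolved Future."""
    future = Future()
    future.set_result(value)
    return future


def batched_with_counts(lst, n, skip_lst):
    """Hypothetical count-based variant: skip each earlier-batched value once per occurrence.

    skip_lst holds the earlier batches as plain lists; values must be hashable.
    """
    skip_counts = Counter(item for batch in skip_lst for item in batch)
    # Subtract every skipped occurrence, not one per distinct value.
    n_expected = min(n, len(lst) - sum(skip_counts.values()))
    done_lst = []
    for v in lst:
        if v.done():
            value = v.result()
            if skip_counts[value] > 0:
                # This occurrence was already assigned to an earlier batch.
                skip_counts[value] -= 1
            else:
                done_lst.append(value)
        if len(done_lst) == n_expected:
            return done_lst
    return []


lst = [make_future(x) for x in [1, 1, 1, 2]]
first = batched_with_counts(lst, n=3, skip_lst=[])
second = batched_with_counts(lst, n=3, skip_lst=[first])
print(first, second)  # prints [1, 1, 1] [2]
```

On the same [1, 1, 1, 2] input as the finding, the second batch now resolves to [2] instead of waiting indefinitely.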
🤖 Prompt for all review comments with AI agents

Inline comments:
In `@src/executorlib/task_scheduler/interactive/dependency.py`:
- Around line 349-353: The current batching builds a value-based skip_set by
flattening f.result() into a set which fails for unhashable batch outputs;
change to an identity/position-based skip key so membership checks don't require
hashing the full result. In
src/executorlib/task_scheduler/interactive/dependency.py replace the flattened
value set with a set of identity keys (e.g. {id(item) for f in
task_wait_dict["kwargs"]["skip_lst"] for item in f.result()} or a tuple of
(future_index, item_index) keys) and update the corresponding membership test in
src/executorlib/standalone/batched.py (the v.result() not in skip_set check) to
compare the same identity/position key (e.g. id(v.result()) or (v_future_index,
v_item_index)) instead of the raw value so unhashable batch items no longer
raise TypeError.
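The other remedy, tracking per-future identity, avoids both the duplicate-count problem and the unhashable-result TypeError, because concurrent.futures.Future objects hash by identity regardless of their result. A minimal sketch, assuming a hypothetical caller-owned `assigned` set in place of the skip_lst / nested_skip_lst parameter; this is not the signature the PR merged:

```python
from concurrent.futures import Future


def make_future(value):
    """Example helper (hypothetical): return an already-resolved Future."""
    future = Future()
    future.set_result(value)
    return future


def batched_by_future(lst, n, assigned):
    """Hypothetical per-future variant: remember which Future objects were already batched.

    Futures hash by identity, so result values may repeat or be unhashable.
    The caller owns `assigned`; it is updated in place when a batch is returned.
    """
    n_expected = min(n, sum(1 for f in lst if f not in assigned))
    picked = []
    for f in lst:
        if f.done() and f not in assigned:
            picked.append(f)
        if len(picked) == n_expected:
            assigned.update(picked)
            return [f.result() for f in picked]
    return []


# Duplicate *and* unhashable results: a value-based set would raise TypeError here.
lst = [make_future(v) for v in ([1], [1], {"a": 2}, {"a": 2})]
assigned = set()
first = batched_by_future(lst, n=2, assigned=assigned)
second = batched_by_future(lst, n=2, assigned=assigned)
third = batched_by_future(lst, n=2, assigned=assigned)
print(first, second, third)  # prints [[1], [1]] [{'a': 2}, {'a': 2}] []
```

Mutating `assigned` in place keeps the bookkeeping inside one call but makes the function stateful; returning the picked futures and letting the scheduler record them would work equally well.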


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9a5e9403-9861-440d-ac4c-96e84ef390c4

📥 Commits

Reviewing files that changed from the base of the PR and between ecdee72 and 6cf9104.

📒 Files selected for processing (2)
  • src/executorlib/standalone/batched.py
  • src/executorlib/task_scheduler/interactive/dependency.py

Comment thread on src/executorlib/task_scheduler/interactive/dependency.py (outdated)
@jan-janssen (Member, Author) commented:

@copilot fix the following two errors in the tests:

======================================================================
ERROR: test_batched_futures (unit.standalone.test_batched.TestBatched.test_batched_futures)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/executorlib/executorlib/tests/unit/standalone/test_batched.py", line 13, in test_batched_futures
    self.assertEqual(batched_futures(lst=lst, n=3, skip_lst=[]), [0, 1, 2])
                     ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: batched_futures() got an unexpected keyword argument 'skip_lst'. Did you mean 'skip_set'?

======================================================================
ERROR: test_batched_futures_not_finished (unit.standalone.test_batched.TestBatched.test_batched_futures_not_finished)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/executorlib/executorlib/tests/unit/standalone/test_batched.py", line 23, in test_batched_futures_not_finished
    self.assertEqual(batched_futures(lst=lst, n=3, skip_lst=[]), [])
                     ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: batched_futures() got an unexpected keyword argument 'skip_lst'. Did you mean 'skip_set'?

----------------------------------------------------------------------

Copilot AI (Contributor) commented May 28, 2026

@copilot fix the following two errors in the tests:

...

Fixed in the latest commit. Updated the tests to use skip_set=set() / skip_set={...} (flat set) instead of skip_lst=[] / skip_lst=[[...]] to match the batched_futures function signature.

@coderabbitai (bot, Contributor) commented May 28, 2026

Actionable comments posted: 0

@coderabbitai (bot, Contributor) commented May 28, 2026

Actionable comments posted: 0

Updated versioning system from vcs-versioning to setuptools-scm and modified version details.
jan-janssen marked this pull request as draft May 28, 2026 23:02
codecov (bot) commented May 28, 2026
Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.19%. Comparing base (fd8b9a9) to head (dbf6852).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #991   +/-   ##
=======================================
  Coverage   94.19%   94.19%           
=======================================
  Files          39       39           
  Lines        2103     2103           
=======================================
  Hits         1981     1981           
  Misses        122      122           


jan-janssen marked this pull request as ready for review June 10, 2026 17:55
jan-janssen merged commit 65f1ef3 into main Jun 10, 2026
90 of 94 checks passed
jan-janssen deleted the fast_batched branch June 10, 2026 17:55

3 participants